
Querying `Series`

A pandas Series can be queried either by index position or by index label. If no index is specified when the Series is created, pandas assigns default integer labels starting at 0, so the position and the label are effectively the same values.
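A minimal sketch of that point, assuming a Series built from a plain list so that pandas assigns its default integer index (the name `classes` is illustrative only):

```python
import pandas as pd

# No index is given, so pandas assigns the default labels 0, 1, 2, 3
classes = pd.Series(['Physics', 'Chemistry', 'English', 'History'])

# With the default index, label 2 and position 2 refer to the same entry
by_label = classes.loc[2]
by_position = classes.iloc[2]
print(by_label, by_position)  # English English
```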

Querying by Index Position

To query by numeric location (starting at zero), use the iloc attribute.

import pandas as pd
students_classes = {'Alice': 'Physics',
'Jack': 'Chemistry',
'Molly': 'English',
'Sam': 'History'}
s = pd.Series(students_classes)

# Querying the fourth entry using iloc
s.iloc[3]
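Beyond a single integer, iloc also accepts negative positions, lists of positions, and slices. A short sketch reusing the students_classes data (redefined so the block runs on its own):

```python
import pandas as pd

students_classes = {'Alice': 'Physics',
                    'Jack': 'Chemistry',
                    'Molly': 'English',
                    'Sam': 'History'}
s = pd.Series(students_classes)

# A negative position counts from the end, as with Python lists
last = s.iloc[-1]          # 'History'

# A list of positions returns a smaller Series that keeps the original labels
subset = s.iloc[[0, 2]]    # Alice -> Physics, Molly -> English

# A positional slice excludes the stop position, as in plain Python
first_two = s.iloc[0:2]    # Alice, Jack
```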

Querying by Index Label

To query by the index label, use the loc attribute.

# Querying the class Molly is taking using loc
s.loc['Molly']
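loc accepts a list of labels as well, and label slices behave differently from positional ones: both endpoints are included. A sketch on the same data:

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics',
               'Jack': 'Chemistry',
               'Molly': 'English',
               'Sam': 'History'})

# A list of labels selects several entries at once
pair = s.loc[['Jack', 'Sam']]

# A label slice includes BOTH endpoints, unlike an iloc slice
jack_to_molly = s.loc['Jack':'Molly']
print(jack_to_molly.tolist())  # ['Chemistry', 'English']
```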

Using the Indexing Operator

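Pandas also lets you apply the indexing operator directly to the Series. It decides whether to look up by position or by label based on the key and the type of the index.

# Querying by integer parameter (acts like iloc; this positional fallback is deprecated in recent pandas versions, so s.iloc[3] is the safer spelling)
s[3]

# Querying by object parameter (acts like loc)
s['Molly']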

Indexing with Integer Labels

If the index itself consists of integers, the indexing operator treats an integer key as a label rather than a position. Be explicit and use the iloc or loc attributes to avoid confusion.

class_code = {99: 'Physics',
100: 'Chemistry',
101: 'English',
102: 'History'}
s = pd.Series(class_code)

# This raises a KeyError: with integer labels, s[0] looks up the label 0, which does not exist
s[0]

# Correct way to query the first item
s.iloc[0]
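A small sketch contrasting the two explicit accessors on the integer-labelled Series; the try/except only captures the error so the block runs to completion:

```python
import pandas as pd

class_code = {99: 'Physics', 100: 'Chemistry', 101: 'English', 102: 'History'}
s = pd.Series(class_code)

# loc looks up the label 99; iloc looks up position 0. Both reach 'Physics'
by_label = s.loc[99]
by_position = s.iloc[0]

# Plain s[0] is a label lookup here, and 0 is not one of the labels
try:
    s[0]
    result = 'found'
except KeyError:
    result = 'KeyError'
print(by_label, by_position, result)  # Physics Physics KeyError
```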

Working with Series Data

Iterating Over Series

A common task is to perform operations on all values in a Series.

grades = pd.Series([90, 80, 70, 60])
total = 0
for grade in grades:
    total += grade
print(total / len(grades))

Vectorization

Vectorization lets NumPy apply an operation to an entire array at once in compiled code instead of looping over the elements in Python, which is usually much faster.

import numpy as np

# Using NumPy's sum function
total = np.sum(grades)
print(total / len(grades))
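The Series object also provides its own vectorized reductions, so the same average can be computed without calling NumPy explicitly. A short sketch using the same grades:

```python
import pandas as pd

grades = pd.Series([90, 80, 70, 60])

# Series.sum and Series.mean operate on the whole array at once, like np.sum
total = grades.sum()
average = grades.mean()
print(total, average)  # 300 75.0
```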

Performance Comparison

The two approaches can be compared with the timeit magic available in IPython and Jupyter. A %%timeit line must be the first line of its own notebook cell, and -n 100 runs the cell body 100 times in each timing loop.

numbers = pd.Series(np.random.randint(0, 1000, 10000))

%%timeit -n 100
# Iterative approach
total = 0
for number in numbers:
    total += number
total / len(numbers)

%%timeit -n 100
# Vectorized approach
total = np.sum(numbers)
total / len(numbers)
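%%timeit exists only in IPython and Jupyter. In a plain Python script, the standard library's timeit module gives a comparable measurement. A sketch, assuming two hypothetical helper functions, loop_average and vectorized_average, introduced here only for the comparison (run 10 times each to keep it quick):

```python
import timeit

import numpy as np
import pandas as pd

numbers = pd.Series(np.random.randint(0, 1000, 10000))

def loop_average(series):
    # Hypothetical helper: sums the values with a plain Python loop
    total = 0
    for number in series:
        total += number
    return total / len(series)

def vectorized_average(series):
    # Hypothetical helper: lets NumPy sum the whole array at once
    return np.sum(series) / len(series)

# timeit.timeit returns the total seconds for `number` calls
loop_time = timeit.timeit(lambda: loop_average(numbers), number=10)
vectorized_time = timeit.timeit(lambda: vectorized_average(numbers), number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vectorized_time:.4f}s")
```

Both helpers return the same average; only the time taken differs.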

Broadcasting

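Broadcasting applies a scalar operation to every value in the Series without an explicit loop. Here, 2 is added to each number.

# First five values before the update
numbers.head()
# Add 2 to every element
numbers += 2
# First five values after the update
numbers.head()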
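Broadcasting works with other scalar operations too, including comparisons. Without augmented assignment, each operation returns a new Series and leaves the original alone. A sketch with a hypothetical prices Series:

```python
import pandas as pd

prices = pd.Series([10, 20, 30])  # hypothetical example data

# Each scalar operation is applied to every element and returns a NEW Series
doubled = prices * 2       # 20, 40, 60
shifted = prices - 5       # 5, 15, 25
is_large = prices > 15     # False, True, True (a boolean Series)

# The original is unchanged; only augmented assignment such as += updates it
print(prices.tolist())     # [10, 20, 30]
```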

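Iterating with items

A Series can also be iterated as (label, value) pairs using items(). The older iteritems() method was deprecated in pandas 1.5 and removed in pandas 2.0. The loop below adds 2 to each value one element at a time; iat sets a value by position, which works here because the default labels equal the positions.

for label, value in numbers.items():
    numbers.iat[label] = value + 2
numbers.head()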
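With non-integer labels, iat (position-based) no longer lines up with the labels that items() yields; at is the label-based counterpart. A short sketch on a hypothetical two-entry Series:

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry'})  # hypothetical data

# items() yields (label, value) tuples in index order
pairs = list(s.items())
print(pairs)  # [('Alice', 'Physics'), ('Jack', 'Chemistry')]

# Write back by label with .at, since the labels here are strings
for label, value in s.items():
    s.at[label] = value.upper()
print(s.tolist())  # ['PHYSICS', 'CHEMISTRY']
```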

Performance Comparison with Broadcasting

%%timeit -n 10
# Iterative approach
s = pd.Series(np.random.randint(0, 1000, 1000))
for label, value in s.items():
    s.loc[label] = value + 2

%%timeit -n 10
# Broadcasting approach
s = pd.Series(np.random.randint(0, 1000, 1000))
s += 2
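Speed aside, the two approaches should agree on the result. A quick sketch that checks this with Series.equals, working on a copy so the loop does not alter the original data:

```python
import numpy as np
import pandas as pd

original = pd.Series(np.random.randint(0, 1000, 1000))

# Loop version: update each entry by label on a copy
looped = original.copy()
for label, value in looped.items():
    looped.loc[label] = value + 2

# Broadcasting version: one vectorized operation, returning a new Series
broadcast = original + 2

# equals compares shape, values, and dtype
print(looped.equals(broadcast))  # True
```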

Modifying Series with loc

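The loc attribute can modify an existing value in place, and if the label passed to it does not exist yet, it adds a new entry.

s = pd.Series([1, 2, 3])
# 'History' is not an existing label, so a new entry is appended
s.loc['History'] = 102
s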
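A sketch showing both behaviours on the same Series: overwriting an existing label, then enlarging with a new one. Note that the index ends up mixing integers and a string:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Assigning to an existing label overwrites that value in place
s.loc[0] = 10

# Assigning to a label that does not exist appends a new entry
s.loc['History'] = 102

print(s.index.tolist())  # [0, 1, 2, 'History']
print(s.tolist())        # [10, 2, 3, 102]
```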

Merging Series

Index labels in a Series do not have to be unique. The example below gives all three of Kelly's classes the same label, 'Kelly', and then merges the two Series.

students_classes = pd.Series({'Alice': 'Physics',
'Jack': 'Chemistry',
'Molly': 'English',
'Sam': 'History'})
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'], index=['Kelly', 'Kelly', 'Kelly'])

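# Merging the Series (Series.append was removed in pandas 2.0; pd.concat replaces it)
all_students_classes = pd.concat([students_classes, kelly_classes])
all_students_classes

Considerations with concat

  • pd.concat returns a new Series; the original Series are left unchanged.
  • The data type of the result is inferred from the inputs.
# The original Series is unchanged
students_classes

# A duplicated label returns every matching entry as a Series
all_students_classes.loc['Kelly']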
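Querying a duplicated label returns a Series rather than a single value. If duplicate labels would be a mistake in your data, pd.concat can reject them via its verify_integrity parameter, which raises a ValueError when the combined index is not unique. A sketch:

```python
import pandas as pd

students_classes = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
                              'Molly': 'English', 'Sam': 'History'})
kelly_classes = pd.Series(['Philosophy', 'Arts', 'Math'],
                          index=['Kelly', 'Kelly', 'Kelly'])

all_students_classes = pd.concat([students_classes, kelly_classes])

# A duplicated label returns a Series of every matching row, not a scalar
kelly = all_students_classes.loc['Kelly']
print(kelly.tolist())  # ['Philosophy', 'Arts', 'Math']

# verify_integrity=True makes concat raise if the result has duplicate labels
try:
    pd.concat([students_classes, kelly_classes], verify_integrity=True)
    outcome = 'ok'
except ValueError:
    outcome = 'ValueError'
print(outcome)  # ValueError
```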